Finding the Storyteller: Automatic Spoiler Tagging using Linguistic Cues

Authors

  • Sheng Guo
  • Naren Ramakrishnan
Abstract

Given a movie comment, does it contain a spoiler? A spoiler is a comment that, when disclosed, would ruin a surprise or reveal an important plot detail. We study automatic methods to detect comments and reviews that contain spoilers and apply them to reviews from the IMDB (Internet Movie Database) website. We develop topic models, based on Latent Dirichlet Allocation (LDA), but using linguistic dependency information in place of simple features from bag of words (BOW) representations. Experimental results demonstrate the effectiveness of our technique over four movie-comment datasets of different scales.
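The contrast between bag-of-words features and dependency-based features can be illustrated with a small sketch. The abstract does not specify how dependency information is encoded, so the head->dependent pair format, the function names, and the hand-written toy parse below are all assumptions made for illustration, not the authors' method. In practice the triples would come from a dependency parser, and the resulting counts would be fed to an LDA implementation in place of word counts.

```python
from collections import Counter


def bow_features(tokens):
    """Bag-of-words: count each lowercased token independently of context."""
    return Counter(t.lower() for t in tokens)


def dependency_features(triples):
    """Count head->dependent word pairs from a dependency parse.

    `triples` is a list of (head, relation, dependent) tuples. The pair
    encoding used here is a hypothetical choice, not taken from the paper.
    """
    return Counter(
        f"{head.lower()}->{dep.lower()}" for head, _rel, dep in triples
    )


# Hand-written toy parse of "The butler kills the detective".
# This is an assumed example, not real parser output.
tokens = ["The", "butler", "kills", "the", "detective"]
triples = [
    ("kills", "nsubj", "butler"),
    ("kills", "dobj", "detective"),
    ("butler", "det", "The"),
    ("detective", "det", "the"),
]

bow = bow_features(tokens)
deps = dependency_features(triples)
```

Under bag of words, "butler", "kills", and "detective" are three unrelated counts; the dependency pairs keep the link between the verb and its arguments, which is the kind of plot-revealing relation a spoiler detector would plausibly want to capture.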

Similar resources

A Part-of-Speech Tagging System for the Persian Language

Abstract: Part-of-speech (POS) tagging is an essential step for many models and methods in other areas of natural language processing, such as machine translation, spell checking, text-to-speech, and automatic speech recognition. So far, highly accurate POS taggers have been created for many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...

Tagging Of Speech Acts And Dialogue Games In Spanish Call Home

The Clarity project is devoted to automatic detection and classification of discourse structures in casual, non-task-oriented conversation using shallow, corpus-based methods of analysis. For the Clarity project, we have tagged speech acts and dialogue games in the Call Home Spanish corpus. We have done preliminary cross-level experiments on the relationship of word and speech act n-grams to di...

Automatic Refinement of Linguistic Rules for Tagging

This paper describes an approach to POS tagging based on the automatic refinement of manually written linguistic tagging rules. The refinement was carried out by means of a learning algorithm based on decision trees. The tagging rules work on ambiguity classes: each input word undergoes a morphological analysis and a set of possible tags is returned. The set of tags determines the ambiguity cla...

Automatic Tracking of Obsolescent Segments with Linguistic Cues

This paper deals with the description and the automatic tracking of text segments containing obsolescence in encyclopedia texts. We assume that despite the non-linguistic nature of this phenomenon, discursive cues are relevant to track those segments. For that purpose, we have worked on a corpus which has been manually annotated by experts and on which we have projected automatically tracked cu...

Part-of-Speech Tagging Without Training

The development of the Internet and the World Wide Web can be either a threat to the survival of indigenous languages or an opportunity for their development. The choice between cultural diversity and linguistic uniformity is in our hands and the outcome depends on our capability to devise, design and use tools and techniques for the processing of natural languages. Unfortunately natural langua...

Publication date: 2010